Structure and Semantics of Data-Intensive Web Pages: An Experimental Study on their Relationships

Authors

  • Lorenzo Blanco
  • Valter Crescenzi
  • Paolo Merialdo
Abstract

In data-intensive web sites, pages are generated by scripts that embed data from a backend database into HTML templates. There is usually a relationship between the semantics of the data in a page and its corresponding template. For example, in a web site about sports events, it is likely that pages with data about athletes are associated with a template that differs from the template used to generate pages about coaches or referees. This article presents a method to classify web pages according to the associated template. Given a web page, the goal of our method is to accurately find the pages that are about the same topic. Our method leverages a simple yet effective model that abstracts some structural features of a web page. We present the results of an extensive experimental analysis that show the performance of our method, in terms of both recall and precision, on a large number of real-world web pages.
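The abstract does not spell out the structural model, so the following is only a minimal sketch of the general idea, under stated assumptions: a page is abstracted as the set of its root-to-element tag paths, two pages are taken to share a template when the Jaccard similarity of those sets reaches a threshold, and retrieval quality is scored with precision and recall. The tag-path signature, the Jaccard measure, the 0.7 threshold, and all function names are illustrative assumptions, not the authors' model.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths seen in a page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; ignore stray closes.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def structural_signature(html):
    """Assumed structural abstraction: the set of tag paths in the page."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def same_template(query_html, candidates, threshold=0.7):
    """Names of candidate pages whose signature is close to the query's.

    `candidates` maps a page name to its HTML; the threshold is an
    arbitrary illustrative value, not one taken from the article.
    """
    query = structural_signature(query_html)
    return [name for name, html in candidates.items()
            if jaccard(query, structural_signature(html)) >= threshold]


def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In this sketch, two athlete pages that differ only in their data and in the number of table rows yield the same set of tag paths, while a coach page built from a list-based layout shares only the outermost paths and falls below the threshold.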

Similar resources

A duality between LM-fuzzy possibility computations and their logical semantics

Let X be a dcpo and let L be a complete lattice. The family σL(X) of all Scott continuous mappings from X to L is a complete lattice under pointwise order, we call it the L-fuzzy Scott structure on X. Let E be a dcpo. A mapping g : σL(E) −> M is called an LM-fuzzy possibility valuation of E if it preserves arbitrary unions. Denote by πLM(E) the set of all LM-fuzzy possibility valuations of E. T...

Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages

Reverse engineering of network applications especially from the security point of view is of high importance and interest. Many network applications use proprietary protocols which specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...

An assessment of the current status of the organizational structure and its dimensions in medical universities for transition to the third generation university

Background: In today's competitive world, it is hardly possible to achieve strategic goals without having a well-structured organization. Therefore, universities need to focus on improving their organizational structure in order to achieve their goals and sustain their activities. The purpose of this study was to assess the current status of organizational structure dimensions in universities o...

A Discourse Analysis of “The Prince and His Companions” in Kelileh and Demneh Based On Semio-Semantics

Despite showing an overtly simple structure, the semantic process in classic literary-narrative discourse conforms to complicated semiotic systems. As a result, semio-semantics is deemed as one of the most scientific, reliable tools since it helps intradiscursive semio-textual propositions be phenomenologically, and even epistemologically, analyzed. Consequently, the narrative discourse in “The...

The Effect of Menu Position on the Visual Attention of Website Users

Objective: In order to identify users’ visual attention to left- and right-aligned menus on web pages, fixation count index (FCI) was assessed for both left and right menus using eye tracker to determine which menu is preferred by users in terms of visual attention. Methodology: In total, 116 pages with their menus aligned to left or right, classified into three groups, namely Persian pages, En...

Journal:
  • J. UCS

Volume 14  Issue 

Pages  -

Publication date 2008